feat: add vLLM ROCm Docker variant for AMD GPUs #917
Conversation
Hey - I've found 2 issues and left some high-level feedback:
- The vLLM ROCm Docker stage duplicates most of the CUDA vLLM setup; consider factoring the common vLLM/uv/venv installation into a shared intermediate stage or script to keep the two variants in sync more easily (a rough sketch follows this list).
- In the vLLM ROCm Docker stage, you may want to align version configuration with the rest of the image (e.g., via .versions or shared ARGs) so VLLM_ROCM_VERSION and PYTORCH_ROCM_VERSION don’t silently drift from other ROCm-related components.
- For the ROCm image’s apt install line, consider adding --no-install-recommends to reduce image size and keep the dependency set tighter, especially since this base is already quite large.
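To make the first two points concrete, here is one way the common setup could be shared; the stage names, the `.versions` sourcing, and the exact split are assumptions, not the PR's actual layout:

```dockerfile
# Hypothetical shared stage: uv install and venv creation live in one place, so
# the CUDA and ROCm vLLM stages only differ in the wheel indexes they pull from.
FROM base AS vllm-common
RUN curl -LsSf https://astral.sh/uv/install.sh | sh \
    && ~/.local/bin/uv venv --python 3.12 /opt/vllm-env

FROM vllm-common AS vllm-rocm
# Version ARGs ideally come from the same place as the rest of the image
# (e.g. .versions), so ROCm-related components can't silently drift.
ARG VLLM_VERSION
ARG VLLM_ROCM_VERSION
ARG PYTORCH_ROCM_VERSION
RUN . /opt/vllm-env/bin/activate \
    && ~/.local/bin/uv pip install "vllm==${VLLM_VERSION}" \
        --extra-index-url https://wheels.vllm.ai/${VLLM_VERSION}/${VLLM_ROCM_VERSION} \
        --extra-index-url https://download.pytorch.org/whl/${PYTORCH_ROCM_VERSION} \
        --index-strategy unsafe-best-match
```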
## Individual Comments
### Comment 1
<location path="Dockerfile" line_range="132-138" />
<code_context>

```dockerfile
+
+# Install uv and vLLM with ROCm support as modelrunner user.
+# Uses upstream vLLM ROCm wheels (vLLM gained native ROCm support upstream).
+RUN curl -LsSf https://astral.sh/uv/install.sh | sh \
+    && ~/.local/bin/uv venv --python 3.12 /opt/vllm-env \
+    && . /opt/vllm-env/bin/activate \
+    && ~/.local/bin/uv pip install vllm \
+        --extra-index-url https://wheels.vllm.ai/${VLLM_VERSION}/${VLLM_ROCM_VERSION} \
+        --extra-index-url https://download.pytorch.org/whl/${PYTORCH_ROCM_VERSION} \
+        --index-strategy unsafe-best-match
+
+RUN /opt/vllm-env/bin/python3.12 -c "import vllm; print(vllm.__version__)" > /opt/vllm-env/version
```

</code_context>
<issue_to_address>
**issue (bug_risk):** vLLM version arg is not used to pin the installed package, which could desync from the ROCm wheel index path.
`VLLM_VERSION` is only used in the ROCm wheel index URL, while `uv pip install vllm` pulls the latest version from the indexes. If a newer release appears, `uv` may install a different version than the one implied by the ROCm index path, causing resolution errors or mismatched binaries.
Please pin `vllm` to `VLLM_VERSION` so the installed package matches the ROCm wheel index:
```dockerfile
&& ~/.local/bin/uv pip install "vllm==${VLLM_VERSION}" \
--extra-index-url https://wheels.vllm.ai/${VLLM_VERSION}/${VLLM_ROCM_VERSION} \
--extra-index-url https://download.pytorch.org/whl/${PYTORCH_ROCM_VERSION} \
--index-strategy unsafe-best-match
```
</issue_to_address>
### Comment 2
<location path="Dockerfile" line_range="124" />
<code_context>

```dockerfile
+
+USER root
+
+RUN apt update && apt install -y python3.12 python3.12-venv python3.12-dev curl ca-certificates build-essential && rm -rf /var/lib/apt/lists/*
+
+RUN mkdir -p /opt/vllm-env && chown -R modelrunner:modelrunner /opt/vllm-env
```

</code_context>
<issue_to_address>
**suggestion (performance):** Consider using --no-install-recommends to keep the ROCm vLLM image smaller and more deterministic.
You likely only need the explicitly listed packages, so preventing installation of recommended extras will keep this base image lean and more reproducible:
```dockerfile
RUN apt update && apt install -y --no-install-recommends \
python3.12 python3.12-venv python3.12-dev curl ca-certificates build-essential \
&& rm -rf /var/lib/apt/lists/*
```
```suggestion
RUN apt update && apt install -y --no-install-recommends python3.12 python3.12-venv python3.12-dev curl ca-certificates build-essential && rm -rf /var/lib/apt/lists/*
```
</issue_to_address>
@ericcurtin yo! Take a look when you have some free time. Thanks!
Code Review
This pull request introduces ROCm (AMD GPU) support for vLLM by adding a new Docker build stage and corresponding Makefile targets. While the infrastructure additions are appropriate, several critical correctness issues were identified in the Dockerfile: the vLLM version is invalid, the ROCm 6.3 specification will cause build failures and runtime library mismatches with the base image, and the wheel index URL is missing a required version prefix.
```dockerfile
# so the resulting image contains the ROCm runtime that vLLM links against.
FROM llamacpp AS vllm-rocm

ARG VLLM_VERSION=0.19.1
ARG VLLM_ROCM_VERSION=rocm63
ARG PYTORCH_ROCM_VERSION=rocm6.3
```
The ROCm versions specified here are problematic for the following reasons:

- **PyTorch Index 404:** PyTorch does not currently host wheels for ROCm 6.3 on their official index. `https://download.pytorch.org/whl/rocm6.3` will return a 404 error, causing the build to fail. The latest stable version is `rocm6.2`.
- **Base Image Mismatch:** The `llamacpp` base image (resolved from `ghcr.io/ggml-org/llama.cpp:server-rocm`) is built using ROCm 6.2. Installing ROCm 6.3 wheels on top of a ROCm 6.2 runtime will lead to critical failures at runtime due to incompatible shared library versions (e.g., `libamdhip64.so.6`).

Suggested fix:

```dockerfile
ARG VLLM_ROCM_VERSION=rocm62
ARG PYTORCH_ROCM_VERSION=rocm6.2
```
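One way to confirm what ROCm runtime the base image actually ships, assuming the standard ROCm layout inside it (the `/opt/rocm/.info/version` path and the entrypoint override are assumptions about that image):

```sh
docker run --rm --entrypoint cat ghcr.io/ggml-org/llama.cpp:server-rocm /opt/rocm/.info/version
```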
```dockerfile
    && ~/.local/bin/uv venv --python 3.12 /opt/vllm-env \
    && . /opt/vllm-env/bin/activate \
    && ~/.local/bin/uv pip install vllm \
        --extra-index-url https://wheels.vllm.ai/${VLLM_VERSION}/${VLLM_ROCM_VERSION} \
```
Adds a new Docker build target (final-vllm-rocm) that builds upstream vLLM from source on top of rocm/vllm-dev:base (AMD's pre-built ROCm dev image containing PyTorch ROCm, Triton, flash-attention, and the ROCm SDK). vLLM is checked out at the tagged release matching VLLM_VERSION — no fork, no custom wheels. The runtime stage mirrors the /opt/vllm-env layout that the existing vLLM backend expects via symlinks, so no Go code changes are needed. Unlike the CUDA vllm variant, the ROCm image does not include llama.cpp: the base images are incompatible (different ROCm runtime versions), so the ROCm vllm image is intended as a vLLM-only artifact. Fixes #412
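A minimal sketch of what such a stage could look like, assuming the stage name, clone path, and symlink layout; the PR's actual Dockerfile may differ in the details:

```dockerfile
# Build upstream vLLM from source on AMD's ROCm dev image (illustrative only).
FROM rocm/vllm-dev:base AS vllm-rocm
ARG VLLM_VERSION
ARG PYTORCH_ROCM_ARCH="gfx90a;gfx942;gfx1100;gfx1101"
ENV PYTORCH_ROCM_ARCH=${PYTORCH_ROCM_ARCH}
# Check out the tagged upstream release and build it against the PyTorch ROCm,
# Triton, and flash-attention stack already shipped in the base image.
RUN git clone --branch "v${VLLM_VERSION}" --depth 1 \
        https://github.com/vllm-project/vllm.git /opt/vllm-src \
    && pip install --no-build-isolation /opt/vllm-src
# Mirror the /opt/vllm-env layout the existing backend expects; the base image
# installs Python packages system-wide, so symlinks are enough here.
RUN mkdir -p /opt/vllm-env/bin \
    && ln -s "$(command -v python3)" /opt/vllm-env/bin/python3 \
    && ln -s "$(command -v vllm)" /opt/vllm-env/bin/vllm
```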
Force-pushed from 76db8d3 to c285e47
Very nice @dmedovich ... You could do the project a huge favour actually... We are lacking people with AMD GPUs, so if you could confirm llama.cpp/vLLM works with AMD GPUs that would be great! We are also supposed to automatically detect when your machine has an AMD GPU and just do the right thing in that case; we don't know if that works either... Merging on green build
@ericcurtin Will install Docker a bit later and check all three points; I'll come back with detailed info on each.
Fixes #412
Adds a `final-vllm-rocm` Docker build target that runs vLLM on AMD GPUs using upstream vLLM (no fork, no third-party wheels), per the direction in the issue.

Approach
Builds upstream vLLM from source on top of `rocm/vllm-dev:base` — the same base image vLLM's own `docker/Dockerfile.rocm` uses. That base already ships PyTorch ROCm, Triton ROCm, flash-attention ROCm, and the ROCm SDK pre-compiled by AMD. We just clone `vllm@v${VLLM_VERSION}` and build the wheel.

The runtime stage mirrors the `/opt/vllm-env` layout that the existing vLLM backend in `pkg/inference/backends/vllm/vllm.go` expects (via symlinks, since `rocm/vllm-dev:base` installs Python deps system-wide).

New targets
- `make docker-build-vllm-rocm` — build the image
- `make docker-run-vllm-rocm` — build + run

`LLAMA_SERVER_VARIANT=rocm` is passed so the existing logic restricts `DOCKER_BUILD_PLATFORMS` to `linux/amd64`.

Trade-offs to flag
- The `vllm` stage builds on top of `llamacpp` and inherits the llama-server binary. For ROCm, `rocm/vllm-dev:base` and the llama.cpp ROCm image use incompatible ROCm runtime / PyTorch versions, so `vllm-rocm` is a vLLM-only artifact. Happy to revisit if you'd prefer a combined shape.
- `PYTORCH_ROCM_ARCH` defaults to `gfx90a;gfx942;gfx1100;gfx1101` (MI200/MI300 + RDNA3 7900/7800) — what upstream vLLM officially supports. Exposed as a Dockerfile ARG, overridable at build time (example below).
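For example, narrowing the build to a single architecture might look like this (the `final-vllm-rocm` target name is the one this PR adds; the image tag is just a placeholder):

```sh
docker build --target final-vllm-rocm \
  --build-arg PYTORCH_ROCM_ARCH="gfx942" \
  -t model-runner:vllm-rocm .
```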
Tested

- `go build ./...`, `go test ./...`, `go vet`, `golangci-lint` — clean
- `make help` shows the new targets
- Not verified: `make docker-build-vllm-rocm` end-to-end. I don't have a supported GPU (only an RX 6600 = gfx1032, not in vLLM's official ROCm targets), so a runtime test wouldn't be informative even if the build succeeded.

Would appreciate a sanity check on the architectural choice (vLLM-only ROCm image vs combined). Happy to adjust.
History
The original commit (76db8d3, force-pushed) installed vLLM via pip from `wheels.vllm.ai/${VLLM_VERSION}/rocm63`, but that URL doesn't exist — vLLM only publishes CUDA wheels there. Rewrote to the source-build approach above.